MPReplayBuffer

class cpprb.MPReplayBuffer(size, env_dict=None, default_dtype=None, *, logger=None, **kwargs)

Bases: object

Replay buffer class with multi-process support that stores transitions and samples them randomly.

This class can be used from multiple processes without manually locking the entire buffer.

A transition can contain anything compatible with NumPy data types. Users specify its contents freely through the env_dict parameter of the constructor.

A standard transition contains an observation (obs), an action (act), a reward (rew), the next observation (next_obs), and a done flag (done).

>>> env_dict = {"obs": {"shape": (4,4)},
                "act": {"shape": 3, "dtype": np.int16},
                "rew": {},
                "next_obs": {"shape": (4,4)},
                "done": {}}

Sampling in this class is random, and the same transition can be chosen multiple times within one batch.
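To make "the same transition can be chosen multiple times" concrete, here is a minimal standard-library sketch of drawing indices with replacement. It is a toy model, not cpprb's implementation: the helper `sample_indices`, its `seed` parameter, and the assumption of uniform draws are introduced here for illustration only.

```python
import random


def sample_indices(stored_size, batch_size, seed=None):
    """Draw batch_size indices from range(stored_size), with replacement.

    Hypothetical helper: uniform draws are an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Each draw is independent, so an index may repeat within a batch.
    return [rng.randrange(stored_size) for _ in range(batch_size)]


# A batch may be larger than the number of stored transitions,
# in which case repeats are unavoidable.
indices = sample_indices(stored_size=3, batch_size=8, seed=0)
```

Because draws are independent, a batch can be requested even when fewer than batch_size transitions are stored.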

Notes

This class assumes a single learner (which calls sample) and multiple explorers (which call add), as in Ape-X.

Methods Summary

add(self, **kwargs)

Add transition(s) into replay buffer.

clear(self)

Clear replay buffer.

get_all_transitions(self, bool shuffle)

Get all transitions stored in replay buffer.

get_buffer_size(self)

Get buffer size

get_next_index(self)

Get the next index to store

get_stored_size(self)

Get stored size

is_Nstep(self)

Return whether Nstep is used

on_episode_end(self)

Call on episode end

sample(self, batch_size)

Sample the stored transitions randomly with the specified size

Methods Documentation

add(self, **kwargs)

Add transition(s) into replay buffer.

Multiple sets of transitions can be added simultaneously. This method can be called from multiple explorer processes without manual locking.

Parameters

**kwargs (array like or float or int) – Transitions to be stored.

Returns

The first index of the stored position. If all transitions are stored in the NstepBuffer and no transitions are stored in the main buffer, None is returned.

Return type

int or None

Raises

KeyError – If any value defined in the constructor is missing.

Warning

All values must be passed as keyword arguments (key-value style). It is the user's responsibility to ensure that all values have the same step size.
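Since the consistency of step sizes is left to the caller, a small pre-flight check can catch mistakes before calling add. The following is only a sketch: `check_transitions` is a hypothetical helper, not part of cpprb, and it assumes the toy convention that every value is a sequence with one entry per step. Raising ValueError on mismatched lengths is a choice of this sketch.

```python
def check_transitions(env_dict, **kwargs):
    """Return the common number of steps, or raise if the call is malformed.

    Hypothetical helper for illustration; not part of cpprb.
    """
    # Every key declared in env_dict must be supplied as a keyword argument.
    missing = [key for key in env_dict if key not in kwargs]
    if missing:
        raise KeyError(f"missing values for: {missing}")

    # Toy convention: each value is a sequence with one entry per step.
    lengths = {key: len(kwargs[key]) for key in env_dict}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"inconsistent step sizes: {lengths}")
    return next(iter(lengths.values()), 0)


env_dict = {"obs": {}, "rew": {}, "done": {}}
n = check_transitions(env_dict, obs=[0.1, 0.2], rew=[1.0, 0.0], done=[0, 1])
```

Here `n` is the number of steps shared by all three fields.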

clear(self) → void

Clear replay buffer.

Set index and stored_size to 0.

Example

>>> rb = MPReplayBuffer(5, {"done": {}})
>>> rb.add(done=1)
>>> rb.get_stored_size()
1
>>> rb.get_next_index()
1
>>> rb.clear()
>>> rb.get_stored_size()
0
>>> rb.get_next_index()
0

get_all_transitions(self, shuffle: bool = False)

Get all transitions stored in replay buffer.

Parameters

shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.

Returns

transitions – All transitions stored in this replay buffer.

Return type

dict of numpy.ndarray
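Because the result is a dict with one array per field, shuffling only makes sense if every field is reordered by the same permutation. The sketch below illustrates that idea with plain lists. It is a toy model under that assumption, not cpprb's code, and `shuffle_transitions` and `seed` are hypothetical names.

```python
import random


def shuffle_transitions(transitions, seed=None):
    """Return a copy of `transitions` reordered by one shared permutation.

    Hypothetical helper for illustration; not part of cpprb.
    """
    n = len(next(iter(transitions.values())))
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Apply the same order to every field so each transition stays intact.
    return {key: [values[i] for i in order]
            for key, values in transitions.items()}


stored = {"obs": [10, 11, 12, 13], "rew": [0.0, 0.1, 0.2, 0.3]}
shuffled = shuffle_transitions(stored, seed=1)
```

After shuffling, each obs entry is still paired with its original rew entry.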

get_buffer_size(self) → size_t

Get buffer size

Returns

buffer size

Return type

size_t

get_next_index(self) → size_t

Get the next index to store

Returns

the next index to store

Return type

size_t

get_stored_size(self) → size_t

Get stored size

Returns

stored size

Return type

size_t
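The relationship between buffer size, next index, and stored size can be sketched with a counter-only model. This assumes the buffer overwrites its oldest entries once full (a ring buffer), which this page does not state explicitly. `RingCounter` is a hypothetical class that tracks bookkeeping only and stores no data.

```python
class RingCounter:
    """Bookkeeping of a fixed-size ring buffer (toy model, stores no data)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.next_index = 0
        self.stored_size = 0

    def add(self, n=1):
        """Record n new transitions and return the index of the first one."""
        first = self.next_index
        # The write position wraps around; the stored size saturates.
        self.next_index = (self.next_index + n) % self.buffer_size
        self.stored_size = min(self.stored_size + n, self.buffer_size)
        return first

    def clear(self):
        """Reset both counters to 0."""
        self.next_index = 0
        self.stored_size = 0


counter = RingCounter(buffer_size=5)
for _ in range(7):
    counter.add()
# After 7 additions into 5 slots, the two oldest entries were overwritten.
state = (counter.buffer_size, counter.stored_size, counter.next_index)
# state == (5, 5, 2)
```

In this model the stored size never exceeds the buffer size, while the next index keeps cycling through the slots.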

is_Nstep(self) → bool

Return whether Nstep is used

Returns

use_nstep

Return type

bool

on_episode_end(self) → void

Call on episode end

Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.

Notes

Calling this function at the end of each episode is the user's responsibility, since an episode can be terminated at a certain length even when the environment has not set a done flag.
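The interplay between an Nstep buffer, the None return value of add, and the flush at episode end can be sketched with a reward-only toy model. This is an illustration under simplifying assumptions, not cpprb's NstepBuffer: `ToyNstep`, its attributes, and the choice to track only discounted rewards are hypothetical.

```python
from collections import deque


class ToyNstep:
    """Reward-only model of an N-step staging buffer (hypothetical)."""

    def __init__(self, n, gamma):
        self.n = n
        self.gamma = gamma
        self.pending = deque()  # rewards not yet moved to the main buffer
        self.main = []          # discounted returns that reached the main buffer

    def _discounted(self):
        # Discounted sum over the rewards currently pending.
        return sum(self.gamma ** k * r for k, r in enumerate(self.pending))

    def add(self, rew):
        """Return the main-buffer index written, or None if nothing moved."""
        self.pending.append(rew)
        if len(self.pending) < self.n:
            return None
        self.main.append(self._discounted())
        self.pending.popleft()
        return len(self.main) - 1

    def on_episode_end(self):
        """Flush what is left, using shorter horizons for the final steps."""
        while self.pending:
            self.main.append(self._discounted())
            self.pending.popleft()


buf = ToyNstep(n=3, gamma=0.5)
results = [buf.add(r) for r in (1.0, 1.0, 1.0, 1.0)]
buf.on_episode_end()
# results == [None, None, 0, 1]
# buf.main == [1.75, 1.75, 1.5, 1.0]
```

Without the final on_episode_end call, the last two rewards would remain pending and never reach the main buffer in this model.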

sample(self, batch_size)

Sample the stored transitions randomly with the specified size

This method can be called from a single learner process.

Parameters

batch_size (int) – sampled batch size

Returns

sample – A batch of batch_size sampled transitions, which might contain the same transition multiple times.

Return type

dict of ndarray

__init__()

Initialize MPReplayBuffer

Parameters
  • size (int) – buffer size

  • env_dict (dict of dict, optional) – dictionary specifying environments. The keys of env_dict become environment names. The values of env_dict, which are also dicts, define “shape” (default 1) and “dtype” (falls back to default_dtype).

  • default_dtype (numpy.dtype, optional) – fallback dtype for entries whose dtype is not specified in env_dict. The default is numpy.single.
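The defaults described above can be illustrated by resolving an env_dict into explicit specifications. This is a sketch using only the standard library: `resolve_env_dict` is a hypothetical helper, dtype names are written as strings standing in for NumPy dtypes ("float32" for numpy.single), and normalising an integer shape into a one-element tuple is a choice of this sketch.

```python
def resolve_env_dict(env_dict, default_dtype="float32"):
    """Fill in the documented defaults for every entry of env_dict.

    Hypothetical helper for illustration; not part of cpprb.
    """
    resolved = {}
    for name, spec in env_dict.items():
        shape = spec.get("shape", 1)  # documented default shape is 1
        # Normalise an int shape such as 3 into the tuple (3,).
        if isinstance(shape, int):
            shape = (shape,)
        resolved[name] = {
            "shape": tuple(shape),
            "dtype": spec.get("dtype", default_dtype),  # fallback dtype
        }
    return resolved


specs = resolve_env_dict({"obs": {"shape": (4, 4)},
                         "act": {"shape": 3, "dtype": "int16"},
                         "rew": {}})
# specs["rew"] == {"shape": (1,), "dtype": "float32"}
```

An empty entry such as "rew" therefore ends up with shape (1,) and the fallback dtype in this model.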

_encode_sample(self, idx)